DIRECTED DESIGN OF EXPERIMENTS FOR
VALIDATING PROBABILITY OF DETECTION 
CAPABILITY OF A TESTING SYSTEM 

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. 
Provisional Application 61/053,694 filed on May 16, 2008, 
U.S. Provisional Application 61/109,531 filed on Oct. 30,
2008, and U.S. Provisional Application 61/158,868 filed on 
Mar. 10, 2009, each of which is hereby incorporated by ref- 
erence in its entirety. 

ORIGIN OF THE INVENTION

The invention described herein was made by employees of 
the United States Government and may be manufactured and 
used by or for the Government of the United States of
America for governmental purposes without the payment of 
any royalties thereon or therefor. 

TECHNICAL FIELD 

The present invention relates generally to the validation of
a statistics-based testing system, and in particular to a
computer-executed process or method that uses directed design
of experiments (DOE) to validate the probability of detection
(POD) capability of such a testing system.

BACKGROUND OF THE INVENTION 

Certain applications may require nondestructive inspection
evaluation (NDE) of new or used fracture-critical and/or
failure-critical components. For example, in space-based and
certain aeronautical applications, there may be elevated
concern relating to the use of certain components due to aging
and/or impact damage of the components. The presence of
one-of-a-kind or few-of-a-kind critical components having a
limited inspection history and use, and/or that are constructed
of materials having limited availability, has only enhanced the
overall inspection concern.

The determination of the capability of conventional inspection
systems and methodologies using curve fitting or other
techniques may be insufficient for use with updated and
rapidly changing inspection requirements for such systems. For
example, the National Aeronautics and Space Administration
(NASA) currently requires on-orbit inspections of the Space
Shuttle Orbiter’s external thermal protection system. On-orbit
testing is typically performed by trained astronauts as an
extravehicular activity (EVA). Inspection of fracture-critical
and failure-critical components requires inspection to be at
90% probability of detection (POD) with a 95% level of
confidence, commonly referred to in the art and herein as a
90/95 POD.

Design of experiments or DOE describes a statistics-based
process in which changes are made to various input variables
of a system, and the effects on response variables are
measured and recorded. DOE may utilize the concept of “point
estimate probability of a hit” or POH at a given “flaw” size,
with the term “flaw” referring to a physical flaw such as a
crack in a component when used with physical inspection
systems. When used with other systems, the term “flaw” may
refer to any other variable one wishes to inspect for, e.g.,
delivery times, flavor levels in a food product, engineering
properties, etc.

The determination of estimated POH at a selected flaw size
may be a directly measured or observed value between 0 and
1. For a single trial, a “miss” is equal to 0 and a “hit” is equal
to 1. Knowledge of an estimated POH yields a measure of the
lower confidence bound, or P_L. This process is statistically
referred to as “observation of occurrences” and is distinct
from use of functional forms that predict POD.

Traditionally, binomial distributions have been used for 
determining POD by direct observation of occurrences. Con- 
ventional binomial methodologies use a selection of arrange- 
ments for grouping flaws of similar characteristics. These 
approaches have led to the general acceptance of using the 29 
out of 29 (29/29) binomial point estimate method, in combi- 
nation with validation that the POD is increasing with flaw 
size, in order to meet certain governmental requirements or 
standards, e.g., MSFC-STD-1249, NASA-STD-5009, or 
similar standards. 
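
As a worked check of the 29/29 rule mentioned above, the following derivation assumes independent trials with a common hit probability p and a one-sided 95% lower confidence bound P_L; it is an illustrative sketch of why 29 is the smallest zero-miss sample size, not a restatement of any particular standard. With n hits in n trials, the bound is the value of p at which that outcome has probability 0.05:

$$P_L^{\,n} = 0.05 \quad\Longrightarrow\quad P_L = 0.05^{1/n}$$

Requiring P_L of at least 0.90 gives

$$n \;\ge\; \frac{\ln 0.05}{\ln 0.90} \approx 28.4, \qquad\text{so } n = 29, \quad P_L = 0.05^{1/29} \approx 0.902 .$$

With only 28 hits in 28 trials the bound is 0.05^{1/28} ≈ 0.899, which falls just short of 0.90.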

SUMMARY OF THE INVENTION 

Accordingly, a method and an apparatus as set forth herein 
provide a cost-effective way to validate the detection capa- 
bility of various inspection or testing systems, with the term 
“validating” as used herein referring to an approval decision 
reflecting that the testing system meets a particular inspection 
requirement or accuracy threshold. The present invention 
works in binomial applications for POD by adding the con- 
cept of a computer-executable lower confidence bound opti- 
mization process as the driver for establishing a POD thresh- 
old, e.g., a 90/95 POD according to one embodiment, or any 
other desired POD threshold such as but not limited to 90/99 
POD, 80/95 POD, etc., depending on the particular applica- 
tion. 

The method and apparatus satisfy the requirement for criti- 
cal applications where validation of inspection or testing 
systems, individual procedures, and ultimate qualification of 
human or robotic operators is required. Additionally, the 
method and apparatus yield an observed estimate of POD 
rather than a predicted estimate of POD, with functionality 
based on the application of the binomial distribution to a set of 
flaws that are automatically grouped into classes having pre- 
determined widths, i.e., class widths. 

The classes are automatically and systematically varied in 
class width using an automatic iteration process and DOE, 
with a host machine processing the input data set as described 
below to determine a data set having an optimal class width. 
In one embodiment the iteration may start at a minimally 
sized class width, e.g., approximately 0.001", and change by 
constant values, e.g., increments of 0.001" up to a maximum 
expected flaw size. Class width groupings may also start at the 
largest expected flaws and move toward the smallest expected 
flaw size. Flaw size may be any flaw dimension such as width, 
height, depth, volume, shape, etc., when used to describe 
physical flaws, or another value such as delivery time, flavor 
level, engineering-quality, etc., for other testing systems not 
concerned with physical flaws, without departing from the 
intended scope of the invention. 

The largest class length in the first class width group may 
be assigned as the identifier in the group. The next moving 
class width group may be identified by decrementing the 
upper and lower class lengths by the constant value, e.g., 
0.001" in one embodiment. The present invention may also 
require for the purposes of validation that the POD increases 
with flaw size within the range of flaw sizes for which the 
results are valid, and may require inclusion of larger flaw 
sizes in the optimization process as set forth hereinbelow.

The present invention evaluates the lower confidence
bound (P_L) obtained from any class width group. If the lower
confidence bound equals or exceeds 0.90 at any given class 
width group, there exists a grouping of flaws detected at the 
desired 90/95 POD or greater level. Otherwise, such a group- 
ing does not exist. As an output or deliverable product, the 
present invention may provide a detailed set of instructions to 
a user of the testing system for obtaining the desired POD at 
a given or an alternate flaw size. 
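
The moving class-width grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `moving_class_groups`, the use of integer flaw sizes (for example, thousandths of an inch), and the choice to treat both ends of each class as inclusive are all hypothetical. Each group is identified by its upper class length, and the window is moved by decrementing both ends by a constant step.

```python
from typing import List, Tuple

# (flaw size in integer units, hit flag: 1 = hit, 0 = miss)
Flaw = Tuple[int, int]


def moving_class_groups(flaws: List[Flaw], width: int, step: int = 1):
    """Group flaws into moving classes of a fixed width.

    Starts at the largest flaw size and decrements the upper and lower
    class lengths by `step`. Returns a list of tuples
    (upper_class_length, n_flaws, n_hits, observed_poh).
    Inclusive class limits are an assumption of this sketch.
    """
    largest = max(size for size, _ in flaws)
    smallest = min(size for size, _ in flaws)
    groups = []
    upper = largest
    while upper >= smallest:
        lower = upper - width
        members = [hit for size, hit in flaws if lower <= size <= upper]
        if members:  # skip empty classes
            hits = sum(members)
            groups.append((upper, len(members), hits, hits / len(members)))
        upper -= step
    return groups
```

For example, four flaws at sizes 10, 20, 30 and 40 with only the smallest missed, grouped with `width=10` and `step=10`, yield four classes; the class identified by upper length 20 contains one hit and one miss, giving an observed POH of 0.5.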

In particular, the present invention provides a method and 
apparatus for optimizing the lower confidence bound by 
adjusting the class widths used in the binomial analysis. Once 
the optimized lower confidence bound is determined, the 
input data set is identified as a particular case. After deter- 
mining the case, the test system is either validated to be at the 
threshold inspection capability or the test system is not vali- 
dated to be at threshold inspection capability. If the inspection 
system is not validated to be at the threshold inspection capa- 
bility then instructions are given, that, when executed suc- 
cessfully, yield an inspection system that is at a threshold 
inspection capability or an alternate threshold inspection 
capability, or the inspection system is not capable of demon- 
strating the threshold inspection capability. Additional vali- 
dation at the threshold inspection level is performed to assure 
that the inspection capability is increasing with flaw size, by 
including a number of large flaws in the sample set. The 
capability to include other POD data sets to extend the range 
of validation and to limit the sample requirements to meet 
geometric needs is included. The false call analysis requiring 
a minimum specified number of false call opportunities is 
required to complete all validation and qualifications. 

The above features and advantages and other features and 
advantages of the present invention are readily apparent from 
the following detailed description of the best modes for car- 
rying out the invention when taken in connection with the 
accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic illustration of a host machine config- 
ured for validating a test system in accordance with the inven- 
tion; 

FIG. 2 is a chart describing an input data set of hit/miss data 
for the test system shown in FIG. 1; 

FIG. 3 is a flow chart describing a method that may be 
executed using the host machine shown in FIG. 1; 

FIG. 4 is a chart describing an initial set of observed prob- 
ability of hit (POH) data; 

FIG. 5 is a chart describing a partially optimized set of 
observed POH data; 

FIG. 6 is a chart describing a probability of success in 
determining if the POD of large flaws is less than 90/95 POD 
in an exemplary embodiment; and 

FIG. 7 is a table describing a set of cases useable by the host 
machine of FIG. 1. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Referring to the drawings, wherein like reference numbers 
represent like components throughout the several figures, and 
beginning with FIG. 1, a validation host machine 10, herein- 
after referred to as the host 10, includes an algorithm 100 
suitable for executing a test system validation method as set 
forth below with reference to FIG. 3. The host 10 may be used
in conjunction with a testing system 12 for validating the
detection capability of the testing system 12 to a
predetermined threshold, e.g., 90/95 POD or another threshold, and if
desired, for qualifying an operator or inspector 14 for
operation of the testing system 12. In FIG. 1, the testing system 12
is represented as a computer device for simplicity; however,
the testing system 12 may include, or may itself be configured
as, an inspection procedure, e.g., where an inspector uses
liquid penetrants and spray developers with ultraviolet (UV)
light and a 10x magnifier, etc.

Testing system 12 may be, according to one embodiment,
a non-destructive inspection and evaluation (NDE) inspection
system configured for use in the inspection of samples
16. In such an embodiment, the samples 16 may be physical
components, and inspection may be performed to identify
cracks, pits, chips, or other flaws. Those of ordinary skill in
the art will recognize that other potential variations of the
testing system 12 unrelated to inspection of physical components,
e.g., physical delivery or logistical systems, food flavor or
other quality sampling, engineering property sampling, etc.,
that are nevertheless statistical in nature, may also be used
within the scope of the invention. However, for simplicity, the
inspection of samples 16 in the form of physical components
will be described hereinbelow.

The testing system 12 and host 10 may be configured as
microprocessor-based devices having such common elements
as a microprocessor or CPU, memory including but not
limited to: read only memory (ROM), random access
memory (RAM), electrically-programmable read-only
memory (EPROM), etc., and circuitry including but not
limited to: a high-speed clock (not shown), analog-to-digital
(A/D) circuitry, digital-to-analog (D/A) circuitry, a digital
signal processor or DSP, and the necessary input/output (I/O)
devices and other signal conditioning and/or buffer circuitry.

An inspector 14, e.g., a human inspector or an automated
inspection device or robot, physically inspects each of the
samples 16 and records the inspection results 20. Samples 16
may be physical components as noted above such as parts of
a space vehicle, platform, aircraft, etc., or anything else to be
inspected. The inspection results 20 may describe the
observed size of each of the flaws detected by the inspector
14, or the detected amplitude or analog values as noted above.
When referring to something other than a physical component,
the term “flaws” may describe a predetermined variation
from the expected norm.

The testing system 12 includes a calibrated data set 18 of
the actual or known flaws contained in components 16. That
is, the collective set of samples 16 has known flaws and size
distributions. For example, calibrated data set 18 may be
determined via direct measurement and/or testing, whether
nondestructive or destructive, and recorded in memory within
or accessible by the testing system 12. After the data set 18 is
recorded, the inspector 14 is provided with the samples 16
and is required to identify each of the known flaws in the
components 16.

Referring to FIG. 2, once the inspection results 20 are
recorded by the testing system 12, the test system may
automatically compare the inspection results 20 to the values in
the calibrated data set 18 to determine whether a hit or miss is
observed for each test. When used to detect physical flaws,
flaw length may be the detected value according to one
embodiment, and length is therefore used hereinbelow for
simplicity even though descriptive values other than length
may also be used. Alternately, analog values may be entered
by the inspector 14 with the testing system 12 including a
threshold, and the testing system 12 or host 10 may compare
the results 20 with the threshold to determine the hit/miss
test results. The analog threshold may be optimized to
provide a tradeoff between optimum POD and false call rates.

The result of this comparison process is an input data set
22 as shown in FIG. 1, which provides a record of observation
of each of the various flaws in the samples 16 arranged or
organized by flaw size, i.e., the “hits” and “misses”. A POH
equal to 1 refers to an observed flaw, as generally indicated by
the series of “hit” data points 30, while a POH equal to 0 refers
to an unobserved flaw or a “miss” as generally indicated by
the series of miss data points 32. The testing system 12 may
then feed or transmit the input data set 22 to the host 10 for
processing therein using DOE as set forth below, and for
ultimate validation of the detection capability of the testing
system 12 and/or qualification of the inspector 14 by the host
10 using the algorithm 100.

Before referring to FIG. 3, it is noted that binomial
distribution may be used in conjunction with the algorithm 100,
which will be described below with reference to FIG. 3.
Binomial distribution, as will be understood by those of
ordinary skill in the art, describes the behavior of a count variable
(X) if the following conditions apply: (1) the number of
observations (N) is fixed; (2) each observation is independent;
(3) each observation represents one of two outcomes,
i.e., success or failure, e.g., “Hit” or “Miss”, respectively; and
(4) the probability of a “Hit”, or POH, is the same for each
observation. If conditions 1-4 are met, then X has a binomial
distribution.

Various binomial solutions may lead to a 90/95 POD, e.g.,
a 29/29 binomial solution, a 45/46 binomial solution, a 59/61
binomial solution, etc., as will be understood by those of
ordinary skill in the art. In the 59/61 example in particular,
beginning with 61 flaws in the group, each flaw has the same
probability of being observed as a hit, and 61 observations are
ultimately made. If 59 hits are observed then the POH is
59/61, or 0.97, i.e., the observed frequency. This value is only
an estimated POH, as the true POH can only be approached by
making an infinite number of observations, which is a
practical impossibility. The uncertainty in the measurements or
the confidence in the POH is another value to be ascertained.
The term “confidence level” describes the measure of
probability associated with a confidence interval expressing the
probability of truth of a statement that the interval will include
the parameter value. For NDE applications of the type used in
space-based, military, or similar critical applications the
confidence bound of interest for POH is the lower confidence
bound.

In the example of 90/95 POD, which may be imagined as a
bell curve having a 90% lower confidence limit, if the lower
limit (P_L) is 0.9 there is a 95% chance that the true POH is
greater than 90% for that particular flaw size. That is,
returning to the 59/61 example, with X=59 hits after N=61 trials, the
POH is 59/61 or 0.97 as noted above. The lower confidence
bound, or P_L, may be obtained using the following statistical
equation:


$$P_L = \frac{X}{X + (N - X + 1)\,F_\alpha(f_1, f_2)} \qquad (1)$$

where

$$f_1 = 2(N - X + 1) = 6, \qquad F_\alpha(f_1, f_2) = 2.25, \qquad f_2 = 2X = 118, \qquad (2)$$

P_L = 0.9, and α is, a priori, the confidence level of 95%
required of the function F_α(f_1, f_2), which may be obtained
from an F-distribution statistical table. Note that the POH
does not change if the confidence level is changed. This
confidence bound procedure has a probability of at least 0.95
to give a lower bound for the 90% POD point that exceeds a
true but still unknown 90% POD point.
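
For readers without an F-distribution table at hand, the same one-sided lower bound can be computed numerically. The sketch below is an illustration under stated assumptions: it uses the exact binomial (Clopper-Pearson) formulation, which is the standard equivalent of the F-distribution form of equation (1), and solves for the bound by bisection on the binomial upper tail. The function names `binomial_upper_tail` and `lower_confidence_bound` are hypothetical, and only the Python standard library is used.

```python
from math import comb


def binomial_upper_tail(x: int, n: int, p: float) -> float:
    """P(at least x hits in n independent trials) with hit probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def lower_confidence_bound(x: int, n: int, confidence: float = 0.95,
                           tol: float = 1e-10) -> float:
    """One-sided lower confidence bound P_L on POH for x hits in n trials.

    P_L is the hit probability at which observing x or more hits has
    probability 1 - confidence; it is located by bisection because the
    upper tail increases monotonically with p.
    """
    if x == 0:
        return 0.0
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_upper_tail(x, n, mid) < alpha:
            lo = mid  # tail too small, so the bound lies above mid
        else:
            hi = mid
    return lo


# The three binomial solutions named in the text
for hits, trials in [(29, 29), (45, 46), (59, 61)]:
    print(hits, trials, lower_confidence_bound(hits, trials) >= 0.90)
```

Under this formulation each of the 29/29, 45/46 and 59/61 solutions gives a bound of at least 0.90, while one fewer hit in any of them (for example 58/61) does not.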

Referring to FIG. 3, the host 10 of FIG. 1 is configured for
executing the algorithm 100 to thereby qualitatively evaluate
the POD capability of a statistics-based test system such as
the testing system 12, which in one embodiment may be
configured as an NDE system as noted previously
hereinabove. That is, host 10 provides validation of testing that
demonstrates whether the identification of X_POD, i.e., the
90/95 POD flaw size, without false call or large flaw warnings
as explained below, and with explanation and resolution of
any misses above X_POD, qualifies that the inspection system
performs adequately, and that there is 95% confidence that the
POD is greater than 90% (90/95 POD) at and above X_POD.

Beginning with step 102, host 10 receives and records the
set of input data 22 from the testing system 12. Input data 22
may be the hit/miss data shown in FIG. 2, but may also be
analog input data having a threshold value. For example, the
inspector 14 may be told to record an ultrasonic signal
amplitude of 0.5 V, 1 V, 2 V, 3 V, etc., and any amplitudes greater than
a calibrated threshold, e.g., 2 V, may be identified by the host
10 as being “hits”, with all other values being “misses”.
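
The conversion of analog readings to hit/miss data described above can be sketched in a few lines. This is a minimal illustration: the function name `classify_amplitudes` and its parameters are hypothetical, and the 2 V default simply mirrors the example threshold in the text. Amplitudes strictly greater than the threshold count as hits, and all other values count as misses.

```python
from typing import List


def classify_amplitudes(amplitudes_volts: List[float],
                        threshold_volts: float = 2.0) -> List[int]:
    """Return 1 (hit) for each amplitude above the threshold, else 0 (miss)."""
    return [1 if a > threshold_volts else 0 for a in amplitudes_volts]


# Readings from the example in the text: only the 3 V reading exceeds 2 V
print(classify_amplitudes([0.5, 1.0, 2.0, 3.0]))  # [0, 0, 0, 1]
```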

Step 102 may also include performing false call analysis,
and recording the results of the false call analysis. As will be
understood in the art, false call analysis involves providing
flawless samples to an inspector 14. When the inspector 14
finds a flaw that is not actually present, this result is referred
to as a false call, and is much akin to a radiologist reading an
X-ray film and finding an abnormality when none is present,
i.e., a false positive. Statistically, at least 84 false call samples
or false call inspection opportunities should be provided and
the results recorded in a memory location accessible by the
host 10 to ensure proper validation results. After completing
step 102, the algorithm 100 proceeds to step 104.

At step 104, the host 10 processes the input data set 22 to
determine an optimal class width. Generally, the host 10
automatically groups similar size flaws together to optimize
the class width to identify the optimum lower confidence
bound. Referring briefly to FIG. 4, in a first iteration having a
class width of 0.001" the processed input data set 22 appears
with almost all of the POH data points 42 having a 100%
POH, with one data point 40 having a 50% POH. The lower
confidence limit (P_L) is also plotted at 44. Note that the data
point having the highest P_L, i.e., point 46, has a confidence
bound of less than 5%. The class width of the data set
represented in FIG. 4 is therefore less than optimal.

Referring to FIG. 5, after numerous iterations that change
the class width from 0.001" in FIG. 4 to a more optimal class
width of 0.100" in FIG. 5, data points 52 have a 100% POH,
data points 50 have a POH of between 85% and 50%, and one
data point 55 has a POH of 25%. However, the highest P_L of
the data points 56 is approximately 42.5%. While vastly
improved from the confidence bound value of 5% of data
point 46 in FIG. 4, the highest confidence remains a far cry
from the required 90/95 level in this example. Further
iterations are therefore required, again with each successive
iteration increasing the class width by predetermined constant
values, e.g., 0.001". Alternately, class widths may be allowed
to vary in size as part of the optimization process rather than
applying constant values, which may provide a more rapid
response. With each iteration, note that the intermediate data
sets such as those shown in FIGS. 4 and 5 are recorded in
memory of the host 10, as the optimal class width
configuration may not be discernible by the host 10 until each iteration
is completed out to the largest expected flaw size. That is,
increases in class width do not necessarily lead to a more
optimal class width.

Referring again to FIG. 3, step 104 may also require the
inclusion of a predetermined number of “large” flaws for
proper validation of the testing system 12. While it may be
assumed that a testing system accurately detecting a smaller flaw
will naturally detect a larger flaw, the reality may be quite
different. Statistically, at least 29 similar flaws are required at
the target flaw size for supporting validation of certain
existing test systems, e.g., conventional NDE systems, while at
least 25 additional large flaws uniformly distributed between
the target flaw size and the largest expected flaw size are
required for validating new systems, i.e., to ensure that 90/95
POD is provided for the large flaws.

Referring briefly to FIG. 6, table 60 shows the origin of the
statistical threshold of 25 large flaws as noted above. Note
that until at least 25 large flaws are included in the analysis,
the lower confidence limit (LCL) remains below 0.90. A high
confidence zone 62 is achieved at or above 25 large flaw
samples. To ensure the integrity of the validation process,
therefore, at least 25 large flaws should always be included in
the input data set 22.

Referring again to FIG. 3, once the optimal class width is 
determined at step 104 the algorithm 100 proceeds to step 
106. 
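
The class-width optimization of step 104 can be sketched as a search over class widths and moving class positions that keeps the grouping with the highest lower confidence bound. The sketch below is an illustration under stated assumptions, not the algorithm 100 itself: the names `lower_bound` and `optimize_class_width` are hypothetical, flaw sizes are integers (for example, thousandths of an inch), class limits are treated as inclusive, and the bound is the exact binomial lower bound found by bisection (repeated here so the block is self-contained). It omits the large-flaw and false-call checks and simply tracks the best result rather than storing every intermediate data set.

```python
from math import comb
from typing import List, Optional, Tuple

Flaw = Tuple[int, int]  # (integer flaw size, 1 = hit / 0 = miss)


def lower_bound(hits: int, n: int, alpha: float = 0.05) -> float:
    """One-sided lower bound on POH by bisection on the binomial upper tail."""
    if hits == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        tail = sum(comb(n, k) * mid**k * (1 - mid) ** (n - k)
                   for k in range(hits, n + 1))
        if tail < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def optimize_class_width(flaws: List[Flaw], max_width: int, step: int = 1
                         ) -> Tuple[float, Optional[int], Optional[int]]:
    """Return (best lower bound, class width, upper class length).

    Widths grow from `step` to `max_width`; for each width the class is
    moved down from the largest flaw size in increments of `step`.
    """
    largest = max(size for size, _ in flaws)
    smallest = min(size for size, _ in flaws)
    best: Tuple[float, Optional[int], Optional[int]] = (0.0, None, None)
    for width in range(step, max_width + 1, step):
        upper = largest
        while upper >= smallest:
            members = [h for s, h in flaws if upper - width <= s <= upper]
            if members:
                bound = lower_bound(sum(members), len(members))
                if bound > best[0]:  # keep the first grouping that improves
                    best = (bound, width, upper)
            upper -= step
    return best
```

A returned bound of at least 0.90 would indicate that some grouping reaches the 90/95 level in the sense described above. Because a wider class does not necessarily give a better bound, the search visits every width up to `max_width` instead of stopping at the first improvement.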

At step 106, one of a plurality of predetermined case
numbers is assigned to the optimal data set. Referring briefly to
FIG. 7, a table 70 presents a representative set of such cases
72. In column 74, the table 70 records whether or not 90/95
POD is reached at X_POD, i.e., whether the lower confidence
bound is equal to or greater than 0.9. Column 75 may record
whether there is a class length, X_POH, for which POH is equal
to 1 (100%) everywhere greater than X_POH. Class length
describes a point or length at which a particular class width is
attached, e.g., a class width of 0.10" that may contain all flaws
ranging from 0.9" to the 1" class length or flaw size. Column
76 records whether POH is equal to 1 (100%) everywhere
greater than the class length with the optimal lower
confidence bound. Column 77 records whether X_POH is less than or
equal to X_L/3, where X_L describes the largest flaw in the data
set. Column 78 records whether large flaw validation is
complete.

Finally, column 79 provides a detailed analysis and
recommendations for validating the testing system 12 when the
assigned case from column 72 results in a failed validation.
For example, in the example shown in FIG. 7, cases 1 and 1+
reflect passing validations. Column 79 entries for each of
cases 1 and 1+ therefore report that: (a) 90/95 POD has been
reached, which is the goal of the validation effort; (b) false
call warnings should be addressed, if any; and (c) any other
actions that should be resolved.

Cases 2 through 7 and the survey case represent a failed
validation. Rather than simply failing, however, the algorithm
100 and host 10 provide a detailed report to a user of the
system 12 on the precise steps needed for achieving a passing
result, e.g., cases 1 and 1+. While table 70 provides one
possible tabular solution, those of ordinary skill in the art will
recognize that other tables may be used, with different case
numbering and total case numbers, depending on the
particular design of the host 10.

Referring again to FIG. 3, after assigning the case number
at step 106, the algorithm 100 generates output instructions
24, which may be transmitted or otherwise provided to the
testing system 12 or a user thereof. Instructions 24 may be a
detailed report of findings including the precise
recommendations from column 79 of the table 70 of FIG. 7. The
instructions 24 may be displayed by the host 10, printed as a
deliverable report by the host 10, or transmitted or delivered in a
digital format for display and/or printing as desired by a user
of testing system 12. Using the instructions 24, a user of a
failed testing system may follow the instructions 24 and
repeat the validation testing until the testing system 12 is
validated at the 90/95 POD level or there are
recommendations that the testing system 12 will not ever meet threshold
inspection requirements. The algorithm 100 is then finished.

While 90/95 POD is referred to throughout, the actual POD
requirement may vary without departing from the intended
scope of the invention. For example, while 90/95 POD is
expected for certain critical applications, e.g., space and
aeronautic applications, for other applications different POD
thresholds such as 90/99 or 80/95 may be more appropriate.
The apparatus and method of the invention are equally well
suited to validating testing system 12 of FIG. 1 to POD
requirements other than 90/95 POD. Likewise, the term POD
as used herein refers to the embodiment used for inspection of
physical components for flaws such as cracks or fractures.
The apparatus and method may also be used to determine the
probability of a non-detection result, such as probability of
on-time delivery of a package when used with a logistical
testing system. In this case the term POD may be replaced by
an appropriate term, with the operation of the apparatus and
method otherwise being substantially as set forth above.

The present invention is further discussed in Generazio,
Edward R., Directed Design of Experiments (DOE) for
Determining Probability of Detection (POD) Capability of
NDE Systems (DOEPOD), 34th Annual Review of Progress
in Quantitative Nondestructive Evaluation (QNDE 2007),
presentation, July 2007; Generazio, Edward R., Directed
Design of Experiments for Validating Probability of
Detection Capability of NDE Systems (DOEPOD), AIP
Conference Proceedings, 2008, Volume 975, pp. 1693-1700; and
Generazio, Edward R., Directed Design of Experiments
(DOE) for Determining Probability of Detection (POD)
Capability of NDE Systems (DOEPOD), 50th Annual Air
Transportation Association (ATA) Non-Destructive Testing
(NDT) Forum, presentation, August 2007; all incorporated
herein by reference in their entirety.

While the best modes for carrying out the invention have
been described in detail, those familiar with the art to which
this invention relates will recognize various alternative
designs and embodiments for practicing the invention within
the scope of the appended claims.

What is claimed as new and desired to be secured by
Letters Patent of the United States is:

1. A method of validating the performance of a statistical
testing system using directed design of experiments (DOE),
the method comprising:

recording an input data set in a memory location that is
accessible by a host machine, wherein the input data set
describes a set of observed probability of hit (POH) data
for a plurality of samples as a function of a characteristic
of the samples;

processing the input data set using the host machine to
thereby generate an output data set having an optimal
class width;

assigning a case number to the output data set; and

generating a set of instructions using the assigned case
number;

wherein the instructions validate the performance of the
testing system when the assigned case number equals a
predetermined case number, and wherein the
instructions inform a user of the testing system regarding
required steps for validating the testing system when the
assigned case number does not equal the predetermined
case number.

2. The method of claim 1, wherein recording an input data
set includes recording a flaw in the samples as the
characteristic.

3. The method of claim 1, including using a processor to
automatically calculate a lower confidence bound using the
POH data, and recording the lower confidence bound using
the host machine for use in processing the input data set.

4. The method of claim 1, wherein processing the input 
data set continues until a threshold POD and a lower confi- 
dence bound is reached for all samples within a given class 
width. 

5. The method of claim 1, wherein processing the input
data set includes generating a first intermediate data set for a
first class width, increasing the size of the first class width by
a constant value to produce a second class width, and then
generating an additional intermediate data set for each of a
plurality of different class widths.

6. The method of claim 1, wherein recording the input data
set includes recording the results of a false call analysis
procedure of a calibrated number of false call samples.

7. A method of validating a probability of detection (POD)
testing system, the method comprising:

recording an input data set in a memory location accessible
by a host machine, the input data set describing a set of
observed data for a plurality of sample components as a
function of size of a flaw in the sample components;

using the host machine to generate an output data set
having an optimal class width, wherein the host machine
generates the output data set using an algorithm that
automatically processes the input data set through
multiple class width iterations using directed design of
experiments (DOE);

selecting a case number from a plurality of predetermined
cases using the output data set having the optimal class
width; and

generating a set of instructions based on the selected case,
including at least one of displaying the set of instructions
on a display screen, transmitting an electronic copy of
the instructions to a remote system, and printing a report;

wherein the content of the instructions corresponds to the
selected case number and validates the testing system
only when the selected case is equal to a predetermined
case number.

8. The method of claim 7, wherein the set of observed data
is at least one of: a set of hit and miss data, a set of analog data
with a corresponding threshold, and a set of false call data.

9. The method of claim 7, wherein using a host machine to
generate an output data set having an optimal class width
includes using a binomial solution to generate a data set
having a POD that is a 90/95 POD.

10. The method of claim 7, further comprising:

receiving a second input data set using the host machine;
and

processing the second input data set using the host machine
to determine the output data set;

wherein the second input data set is an additional set of
POD data that is generated by one of the testing system
and a device that is external to the testing machine.

11. The method of claim 7, further comprising:

recording a dimensional limitation of the components; and

automatically limiting the number of tests required at large
class lengths using the dimensional limitation.

12. The method of claim 7, wherein recording the input 
data set includes recording the results of a false call analysis 
procedure of at least 84 false call samples or false call test 
opportunities. 

13. The method of claim 7, wherein using a host machine to
generate an output data set having an optimal class width
includes processing at least 25 large flaws, wherein the large
flaws are flaws that are uniformly distributed between a target
flaw size (X_POD) of the optimal class width and the largest
expected flaw size.

14. An apparatus adapted for validating the probability of
detection (POD) capability of a statistics-based testing
system, the apparatus comprising:

a host machine having a processor and a memory location,
wherein the host machine is in communication with the
testing system and adapted for receiving an input data set
from the testing system and recording the input data set
in the memory location, and wherein the input data set
describes a set of observed probability of hit (POH) data
for a plurality of sample components as a function of size
of a flaw in the components; and

an algorithm for applying a directed design of experiments
(DOE) to the input data set to thereby validate the
performance of the testing system;

wherein the algorithm is executed via the processor to:

apply the DOE to the input data set to determine a data set
having an optimal class width;

assign a case number to the data set having the optimal
class width; and

generate a set of instructions having a validation result that
is based on the case number.

15. The apparatus of claim 14, wherein the testing system 
is configured as a non-destructive evaluation inspection 
(NDE). 

16. The apparatus of claim 14, wherein the set of instruc- 
tions includes detailed steps describing how a user of the 
testing system may achieve a passing validation result for the 
testing system. 

17. The apparatus of claim 16, wherein the set of instruc- 
tions includes detailed steps describing how to qualify an 
inspector for use of the testing system once the testing system 
has been validated. 

18. The apparatus of claim 14, wherein the algorithm deter- 
mines the optimal class width by iterating the class width by 
predetermined constant values starting with a minimum class 
width and continuing to a maximum expected flaw size, and 
by selecting the data set having a threshold POD for all 
samples within that particular class width. 





